A METHOD FOR LIMITING DISCLOSURE IN MICRODATA BASED ON RANDOM NOISE AND TRANSFORMATION

I. Introduction
Survey data are often released as microdata. Survey respondents are thus exposed to the risk of reidentification and disclosure of confidential data, even when identifying information such as name and address is deleted before the data are released. To avoid this disclosure problem, several measures for masking the data have been proposed. They include adding random error, multiplying by random error, microaggregation, data swapping, random rounding, slicing, and combining subrecords. Two researchers have compared these measures with respect to their masking capability and their impact on key statistics. Spruill (1983) carried out an empirical comparison of additive random noise, multiplicative random error, microaggregation, random rounding, and data swapping with regard to the effect of masking on key statistics. She also performed a reidentification experiment based on two distance measures, absolute deviation and squared deviation. Paass (1985) performed a reidentification experiment based on a refined measure of identification that included discriminant analysis. He found that the addition of random error is not an effective measure and therefore proposed new masking schemes such as slicing and subrecord combination. As both studies show, some measures keep summary statistics such as the mean and standard deviation unbiased, while others do not. Likewise, some schemes preserve the original structural relations, and hence the original causal relationships, while others do not. According to Paass, the combination method, which is best suited for masking, caused serious distortion of the relationships among variables. This puts us squarely in a quandary: whether or not to opt for protection despite a grave sacrifice in the usefulness of the data.
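To make the trade-off concrete, the following is a minimal sketch of additive random noise, one of the masking measures listed above. The data are synthetic, and the function name `add_noise`, the noise-to-signal variance ratio `c`, and the choice of normally distributed noise are assumptions introduced here for illustration; they are not taken from Spruill (1983) or Paass (1985).

```python
import random
import statistics


def add_noise(values, c, rng):
    """Mask values by adding independent N(0, c * var(values)) noise.

    `c` is a hypothetical tuning constant: the ratio of the noise
    variance to the variance of the unmasked data.
    """
    noise_sd = (c * statistics.pvariance(values)) ** 0.5
    return [v + rng.gauss(0.0, noise_sd) for v in values]


rng = random.Random(0)

# Synthetic "earnings" values, generated only for this illustration.
earnings = [rng.lognormvariate(10.0, 0.5) for _ in range(20000)]
masked = add_noise(earnings, c=0.25, rng=rng)

# Compare summary statistics before and after masking.
mean_ratio = statistics.fmean(masked) / statistics.fmean(earnings)
var_ratio = statistics.pvariance(masked) / statistics.pvariance(earnings)

print(f"mean ratio:     {mean_ratio:.3f}")
print(f"variance ratio: {var_ratio:.3f}")
```

Under these assumptions the mean ratio should stay close to 1, while the variance ratio should be close to 1 + c. In other words, zero-mean additive noise leaves the mean unbiased but inflates the variance, which is one instance of a masking measure that preserves some summary statistics and not others.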
From the users' point of view, maintaining the usefulness of the data is the abiding requirement for a good masking scheme. At the Bureau of the Census, we have been faced with the task of masking microdata files. For masking earnings data, a new scheme has been developed. The scheme combines random noise inoculation with transformation. In this paper I describe this new measure and provide examples of its application to the earnings data. Since multiple regression is the primary use of the earnings data, I also discuss the theoretical effects of masking on regression. It should be mentioned that the power of this scheme to limit disclosure has not been fully investigated. We are presently planning to perform a reidentification experiment using the software developed by Paass' group. An advantage of the scheme proposed here is that, if users are willing to perform a multiplication to obtain an unbiased estimate of the second moment of the original (unmasked) variables, then we can compact the data points around the mean without hampering the correlation structure. This can be done by using a small "a" value, as will be seen later. For simplicity, the derivation of the formulae is based on unweighted data.

II.1 Transformation on the Variable to Which Random Noise Was Added
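As a rough illustration of the idea described in the Introduction, the sketch below adds random noise to a variable and then applies a linear transformation with a small multiplier `a`. The specific form y = a * z + (1 - a) * mean(z), the noise variance ratio `c`, the synthetic data, and the helper names `mask` and `pearson` are all assumptions made here for illustration; the paper's own formulae are not reproduced in this excerpt and may differ.

```python
import random
import statistics


def mask(values, c, a, rng):
    """Add noise, then shrink the result toward its mean.

    Assumed form (hypothetical, not taken from the paper):
        z = x + e,  e ~ N(0, c * var(x))
        y = a * z + (1 - a) * mean(z)
    """
    noise_sd = (c * statistics.pvariance(values)) ** 0.5
    z = [v + rng.gauss(0.0, noise_sd) for v in values]
    z_mean = statistics.fmean(z)
    y = [a * zi + (1.0 - a) * z_mean for zi in z]
    return z, y


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    su = sum((ui - mu) ** 2 for ui in u) ** 0.5
    sv = sum((vi - mv) ** 2 for vi in v) ** 0.5
    return cov / (su * sv)


rng = random.Random(1)
c, a = 0.25, 0.5  # hypothetical parameter choices

# Synthetic variable x and a correlated covariate w (illustration only).
x = [rng.gauss(50.0, 10.0) for _ in range(20000)]
w = [0.8 * xi + rng.gauss(0.0, 5.0) for xi in x]

z, y = mask(x, c, a, rng)

var_x = statistics.pvariance(x)
var_y = statistics.pvariance(y)

# The user-side multiplication: rescale the masked variance by
# 1 / (a^2 * (1 + c)) to estimate the variance of the unmasked data.
var_recovered = var_y / (a * a * (1.0 + c))

# A linear transformation leaves the correlation unchanged.
corr_z = pearson(z, w)
corr_y = pearson(y, w)

print(f"var(x) = {var_x:.2f}, var(y) = {var_y:.2f}, recovered = {var_recovered:.2f}")
print(f"corr(z, w) = {corr_z:.4f}, corr(y, w) = {corr_y:.4f}")
```

In this sketch a value of `a` below 1 pulls the released values toward the mean, so their variance is smaller than that of the original data, yet a user who knows `a` and `c` can recover the original variance (the second central moment) by one multiplication. The correlation with `w` is the same for `y` as for the noise-added `z`, because a linear transformation with positive slope does not change a correlation coefficient.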
Publication date: 2002